Imageto converts an image file (currently either in portable bitmap format (PBM) or GEM’s IMG format) to either a bitmap font or an Encapsulated PostScript file (EPSF). An image file is simply a large bitmap.
If the output is a font, it can be constructed either by outputting a constant number of scanlines from the image as each “character” or (more usually) by extracting the “real” characters from the image.
The current selection of input formats is rather arbitrary. We implemented the IMG format because that is what our scanner outputs, and the PBM format because Ghostscript can output it (see section GSrenderfont). Other formats could easily be added.
1.1 Imageto usage | Process for extracting fonts from an image.
1.2 IFI files | IFI files supply extra information.
1.3 Invoking Imageto | Command-line options.
1.1 Imageto usage
Usually there are two prerequisites to extracting a usable font from an image file. First, looking at the image, so you can see what you’ve got. Second, preparing the IFI file describing the contents of the image: the character codes to output, any baseline adjustment (as for, e.g., ‘j’), and how many pieces each character has. Each is a separate invocation of Imageto; the first time with either the ‘-strips’ or ‘-epsf’ option, the second time with neither.
In the second step, Imageto considers the input image as a series of image rows. Each image row consists of all the scanlines between a nonblank scanline and the next entirely blank scanline. (A scanline is a single horizontal row of pixels in the image.) Within each image row, Imageto looks top-to-bottom, left-to-right, for bounding boxes: closed contours, i.e., an area whose edge you can trace with a pencil without lifting it.
For example, in the following image Imageto would find two image rows, the first from scanline 1 to scanline 7, the second consisting of only scanline 10. There are six bounding boxes in the first image row and only one in the second. (This example also shows some typical problems in scanned images: the baseline of the ‘m’ is not aligned with those of the ‘i’, ‘j’, and ‘l’; a meaningless black line is present; the ‘i’ and ‘j’ overlap.)
   01234567890123456789
 0
 1        x
 2   x x  x
 3        x
 4   x x  x xxxxx
 5   x x  x x x x
 6     x    x x x
 7   xx
 8
 9
10 xxxxxxxxxxxxxxx
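The image-row rule described above can be sketched in a few lines of Python. This is only an illustration, not Imageto's actual code. The bitmap is an assumed layout of the example image (the horizontal positions are a guess), the function names are hypothetical, and counting 8-connected groups of black pixels is an assumed stand-in for finding closed contours.

```python
# Illustrative sketch only; this is not how Imageto itself is written.
# ASSUMPTION: the horizontal positions in this bitmap are a guess at the
# layout of the example image; only the scanline numbers come from the text.
IMAGE = [
    "",                 # scanline 0
    "       x",         # scanline 1
    "  x x  x",         # scanline 2
    "       x",         # scanline 3
    "  x x  x xxxxx",   # scanline 4
    "  x x  x x x x",   # scanline 5
    "    x    x x x",   # scanline 6
    "  xx",             # scanline 7
    "",                 # scanline 8
    "",                 # scanline 9
    "xxxxxxxxxxxxxxx",  # scanline 10
]


def image_rows(scanlines):
    """Return (first, last) scanline numbers for each run of nonblank scanlines."""
    rows, start = [], None
    for number, line in enumerate(scanlines):
        if line.strip():
            if start is None:
                start = number
        elif start is not None:
            rows.append((start, number - 1))
            start = None
    if start is not None:
        rows.append((start, len(scanlines) - 1))
    return rows


def count_components(scanlines, first, last):
    """Count 8-connected groups of black pixels between two scanlines.

    ASSUMPTION: a connected group of pixels is used here as a simple
    approximation of one bounding box.
    """
    black = {(y, x)
             for y in range(first, last + 1)
             for x, ch in enumerate(scanlines[y]) if ch == "x"}
    count = 0
    while black:
        count += 1
        stack = [black.pop()]
        while stack:
            y, x = stack.pop()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    neighbor = (y + dy, x + dx)
                    if neighbor in black:
                        black.remove(neighbor)
                        stack.append(neighbor)
    return count


rows = image_rows(IMAGE)
counts = [count_components(IMAGE, first, last) for first, last in rows]
print(rows, counts)
```

With this assumed layout the sketch finds the same two image rows as the text, with six pieces in the first and one in the second.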
1.1.1 Viewing an image | Seeing what’s in an image.
1.1.2 Image to font conversion | Extracting a font.
1.1.3 Dirty images | Handling scanning artifacts or other noise.
1.1.1 Viewing an image
Typically, the first step in extracting a font from an image is to see exactly what is in the image. (Clearly, this is unnecessary if you already know what your image file contains.)
The simplest way to get a look at the image file, if you have Ghostscript or some other suitable PostScript interpreter, is to convert the image file into an EPSF file with the ‘-epsf’ option. Here is a possible invocation:
imageto -epsf ggmr.img
Here we read an input file ‘ggmr.img’; the output is ‘ggmr.eps’. You can then view the EPS file with
gs ggmr.eps
(presuming that ‘gs’ invokes your PostScript interpreter).
If you don’t have both a suitable PostScript interpreter and enough disk space to store the EPS file (it uses approximately twice as much disk space as the original image), the above won’t work. Instead, to view the image you must make a font with the ‘-strips’ option:
imageto -strips ggmr.img
The output of this will be ‘ggmrsp.1200gf’ (our image having a resolution of 1200 dpi). Although the GF font cannot be conveniently viewed directly, you can use TeX and your favorite DVI processor to look at it, as follows:
fontconvert -tfm ggmrsp.1200
echo ggmrsp | tex strips
This produces ‘strips.dvi’, which you can view with your favorite DVI driver. (See section Archives, for how to obtain the DVI drivers for PostScript and X we recommend.)
‘strips.tex’ is distributed in the ‘imageto’ directory.
1.1.2 Image to font conversion
Once you can see what is in the image, the next step is to prepare the IFI file (see section IFI files) corresponding to its characters. Imageto relies completely on the IFI files to describe the image; it makes no attempt at optical character recognition, i.e., guessing what the characters are from their shapes.
You must also decide on a few more aspects of the output font, which you specify with options:
For instance, in the example image in Imageto usage, it would be best to specify ‘-baselines=2,0’. The ‘2’ is scanline #5 in that image. The ‘0’ is an arbitrary value for scanline #10, which we will ignore via the IFI file (see section IFI files).
For each character written, the ‘-print-guidelines’ option produces output on the terminal that looks like:
75 (K) 5/315
This means that character code 75, whose name in the encoding file is ‘K’, has its bottom row at row 5, and its top row at row 315; i.e., the character has five blank rows above the origin. This is almost certainly wrong (the letter ‘K’ should sit on the typesetting baseline), so we would want to adjust it downwards to 0 via the individual character adjustment (see section IFI files).
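If you save the terminal output, a short script can pick out the characters whose bottom row is not zero. The following Python sketch assumes that every guideline line has exactly the shape shown above (code, name in parentheses, bottom/top); the function names are hypothetical. Deciding which of the reported characters are genuine descenders, and so should be left alone, is still up to you.

```python
import re

# ASSUMPTION: every guideline line looks like "75 (K) 5/315".
GUIDELINE = re.compile(r"^(\d+) \((.+)\) (-?\d+)/(-?\d+)$")


def parse_guideline(line):
    """Split one guideline line into its parts, or return None if it does not match."""
    match = GUIDELINE.match(line.strip())
    if match is None:
        return None
    code, name, bottom, top = match.groups()
    return {"code": int(code), "name": name,
            "bottom": int(bottom), "top": int(top)}


def off_baseline(lines):
    """List (name, bottom row) for characters whose bottom row is not 0."""
    parsed = (parse_guideline(line) for line in lines)
    return [(g["name"], g["bottom"]) for g in parsed if g and g["bottom"] != 0]


sample = ["75 (K) 5/315"]
print(parse_guideline(sample[0]))
print(off_baseline(sample))
```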
The final invocation to produce the font might look something like this:
imageto -baselines=121,130,120 -designsize=26 ggmr
The output from this would be ‘ggmr26.1200gf’.
1.1.3 Dirty images
Your image may not be completely “clean”, i.e., the scanning process may have introduced artifacts: black lines at the edge of the paper; blotches where the original had a speck of dirt or ink; broken lines where the image had a continuous line. To get a correct output font, you must correct these problems.
To remove blotches, you can simply put ‘.notdef’ in the appropriate place in the IFI file. You can find the “appropriate place” when you look at the output font; some character will be nothing but a (possibly tiny) speck, and all the characters following will be in the wrong position.
The ‘-print-clean-info’ option might also help you to diagnose which bounding boxes are being assigned to which characters, when you are in doubt. Here is an example of its output:
[Cleaning 149x383 bitmap:
  checking (0,99)-(10,152) ... clearing.
  checking (0,203)-(35,263) ... clearing.
  checking (0,99)-(130,382) ... keeping.
  checking (113,0)-(149,37) ... keeping.
106]
The final ‘106’ is the character code output (ASCII ‘j’). The size of the overall bitmap which contains the ‘j’ is 149 pixels wide and 383 pixels high. The bitmap contained four bounding boxes, the last two of which belonged to the ‘j’ and were kept, and the first two from the adjacent character (‘i’) and were erased. (As shown in the example image above, the tail of the ‘j’ often overlaps the ‘i’ in type specimens.)
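When there are many characters, it can help to summarize this output mechanically. The Python sketch below parses one such block; the regular expressions are assumptions based solely on the sample shown here, the function name is hypothetical, and the four numbers of each box are kept exactly as printed rather than interpreted.

```python
import re

SAMPLE = """[Cleaning 149x383 bitmap:
  checking (0,99)-(10,152) ... clearing.
  checking (0,203)-(35,263) ... clearing.
  checking (0,99)-(130,382) ... keeping.
  checking (113,0)-(149,37) ... keeping.
106]"""

# ASSUMPTION: these patterns are inferred from the sample output only.
HEADER = re.compile(r"\[Cleaning (\d+)x(\d+) bitmap:")
BOX = re.compile(
    r"checking \((\d+),(\d+)\)-\((\d+),(\d+)\) \.\.\. (clearing|keeping)\.")
CODE = re.compile(r"(\d+)\]")


def parse_clean_info(text):
    """Return the bitmap size, the boxes with their verdicts, and the character code."""
    width, height = (int(v) for v in HEADER.search(text).groups())
    boxes = [(tuple(int(v) for v in m.groups()[:4]), m.group(5))
             for m in BOX.finditer(text)]
    code = int(CODE.search(text).group(1))
    return {"width": width, "height": height, "boxes": boxes, "code": code}


info = parse_clean_info(SAMPLE)
kept = [box for box, verdict in info["boxes"] if verdict == "keeping"]
print(info["width"], info["height"], chr(info["code"]), kept)
```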
If the image has blobs you have not removed with ‘.notdef’, you will see a small bounding box in this output. The numbers shown are in “bitmap coordinates”: (0,0) is the upper left-hand pixel of the bitmap.
If a blotch appears outside of the row of characters, Imageto will consider it to be its own (very small) image row. If you are using ‘-baselines’, you must specify an arbitrary value corresponding to the blotch, even though the bounding box in the image will be ignored. See the section above for an example.
1.2 IFI files
An image font information (IFI) file is a text file which describes the contents of an image file. You yourself must create it; as we will see, the information it contains usually cannot be determined automatically.
If your image file is named ‘foo.img’ (or ‘foo.pbm’), it is customary to name the corresponding IFI file ‘foo.ifi’. That is what Imageto looks for by default. If you name it something else, you must specify the name with the ‘-ifi-file’ option.
Imageto does not look for an IFI file if either the ‘-strips’ or the ‘-epsf’ option is specified.
Each nonblank non-comment line in the IFI file represents a sequence of bounding boxes in the image, and a corresponding character in the output font. See section Common file syntax, for a description of syntax elements common to all data files processed by these programs, including comments.
Each line has one to five entries, separated by spaces and/or tabs. If a line contains fewer than five entries, suitable defaults (as described below) are taken for the missing trailing entries. (It is impossible to supply a value for entry #3, say, without also supplying values for entries #1 and #2.)
Here is the meaning of each entry, in order:
If the character name is ‘.notdef’, or if the character name is not specified in the encoding, Imageto just throws away the bounding boxes. See section Encoding files, for general information on encoding files.
-2
.
You can run Charspace (see section Charspace) to add side bearings to a font semi-automatically. This is usually less work than trying to guess at numbers here.
Here is a possible IFI file for the image in Imageto usage. We throw away the black line that is the second image row. (Imagine that it is a scanner artifact.)
% IFI file for example image.
i 0 2
j 0 2
l
m 1
.notdef % Ignore the black line at the bottom.
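To make the line structure concrete, here is a minimal Python sketch of a reader for such a file. It is not Imageto's parser: it assumes that ‘%’ starts a comment running to the end of the line (as in the example), the function name is hypothetical, and the entries after the character name are kept as plain strings, with no defaults filled in for the missing ones.

```python
EXAMPLE_IFI = """\
% IFI file for example image.
i 0 2
j 0 2
l
m 1
.notdef % Ignore the black line at the bottom.
"""


def parse_ifi(text):
    """Return a list of (character name, remaining entries) pairs."""
    entries = []
    for raw in text.splitlines():
        # ASSUMPTION: '%' begins a comment that runs to the end of the line.
        line = raw.split("%", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        fields = line.split()  # entries are separated by spaces and/or tabs
        if len(fields) > 5:
            raise ValueError("more than five entries: " + raw)
        entries.append((fields[0], fields[1:]))
    return entries


parsed = parse_ifi(EXAMPLE_IFI)
discarded = [i for i, (name, _) in enumerate(parsed) if name == ".notdef"]
print(parsed)
print(discarded)
```

Here the last entry is the one whose bounding boxes are thrown away.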
1.3 Invoking Imageto
This section describes the options that Imageto accepts. See section Command-line options, for general option syntax.
The main input filename (see section Main input file) is called image-name below.
‘-baselines scanline1,scanline2,…’
Define the baselines for each image row. The default baseline for the characters in the first image row is taken to be scanline1, etc. The scanlines are not cumulative: the top scanline in each image row is numbered zero.
‘-designsize real’
Set the design size of the output font to real; default is 10.0.
‘-dpi unsigned’
The resolution of the input image, in pixels per inch (required for PBM input). See section Common options.
‘-encoding enc-file’
The encoding file to read for the mapping between character names and character codes. See section Encoding files. If enc-file has no suffix, ‘.enc’ is appended. Default is to assign successive character codes to the character names in the IFI file.
‘-epsf’
Write the image to ‘image-name.eps’ as an Encapsulated PostScript file.
‘-help’
Print a usage message. See section Common options.
‘-ifi-file filename’
Set the name of the IFI file to filename (if filename has an extension) or ‘filename.ifi’ (if it doesn’t). The default is ‘image-name.ifi’.
‘-input-format format’
Specify the format of the input image; format must be one of ‘pbm’ or ‘img’. The default is taken from image-name, if possible.
‘-nchars unsigned’
Only write the first unsigned (approximately) characters from the image to the output font; default is all the characters.
‘-output-file filename’
Write to filename if filename has a suffix. If it doesn’t, then if writing strips, write to ‘filenamesp.dpigf’; else write to ‘filename.dpigf’. By default, use ‘image-name designsize’ for filename.
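The naming rule just described can be written out as a small Python function. This is a sketch under assumptions, not Imageto's code: the function name and parameters are hypothetical, and “has a suffix” is taken to mean “contains a dot after the last directory separator”, which may differ from what Imageto actually checks.

```python
import os


def output_font_name(filename, dpi, strips=False):
    """Apply the stated rule for turning an output filename into a font file name."""
    # ASSUMPTION: a "suffix" is whatever os.path.splitext treats as an extension.
    if os.path.splitext(filename)[1]:
        return filename
    if strips:
        return f"{filename}sp.{dpi}gf"
    return f"{filename}.{dpi}gf"


print(output_font_name("ggmr26", 1200))
print(output_font_name("ggmr", 1200, strips=True))
print(output_font_name("myfont.gf", 1200))
```

The first two calls reproduce the names ‘ggmr26.1200gf’ and ‘ggmrsp.1200gf’ used in the examples of this chapter.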
‘-print-clean-info’
Print the size of each bounding box considered for removal, and the size of the containing bitmaps. This option implies ‘-verbose’. See section Dirty images, for a full explanation of its output.
‘-print-guidelines’
Print the numbers of the top and bottom scanlines for each character. This option implies ‘-verbose’. See section Image to font conversion, for a full explanation of its output.
‘-range char1-char2’
Only output characters with codes between char1 and char2, inclusive. (See section Common options, and Specifying character codes.)
‘-strips’
Take a constant number of scanlines from the image as each character in the output font, instead of using an IFI file to analyze the image.
‘-trace-scanlines’
Show every scanline as we read it as plain text, using ‘*’ and space characters. This is still another way to view the image (see section Viewing an image), but the result takes an enormous amount of disk space (over eight times as much as the original image) and is quite difficult to look at (because it’s so big). To be useful at all, we start a giant XTerm window with the smallest possible font and look at the resulting file in Emacs. This option is primarily for debugging.
‘-verbose’
Output progress reports. See section Common options. Specifically, a ‘.’ is output for every 100 scanlines read, a ‘+’ is output when an image row does not end on a character boundary, and the character code is output inside brackets.
‘-version’
Print the version number. See section Common options.
This document was generated on November 5, 2024 using texi2html 5.0.